Goto

Collaborating Authors

 false alarm rate


Bandit Quickest Changepoint Detection

Neural Information Processing Systems

Surveillance systems [HC11] are equipped with a suite of sensors that can be switched and steered to focus attention on any target or location over a physical landscape (see Figure 1) to detect abrupt changes at any location. On the other hand, sensor suites are resource limited, and only a limited subset, among all the locations, can be probed at any time.



E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing

Sadhuka, Shuvom, Prinster, Drew, Fannjiang, Clara, Scalia, Gabriele, Regev, Aviv, Wang, Hanchen

arXiv.org Machine Learning

Agentic AI systems execute a sequence of actions, such as reasoning steps or tool calls, in response to a user prompt. To evaluate the success of their trajectories, researchers have developed verifiers, such as LLM judges and process-reward models, to score the quality of each action in an agent's trajectory. Although these heuristic scores can be informative, there are no guarantees of correctness when used to decide whether an agent will yield a successful output. Here, we introduce e-valuator, a method to convert any black-box verifier score into a decision rule with provable control of false alarm rates. We frame the problem of distinguishing successful trajectories (that is, a sequence of actions that will lead to a correct response to the user's prompt) and unsuccessful trajectories as a sequential hypothesis testing problem. E-valuator builds on tools from e-processes to develop a sequential hypothesis test that remains statistically valid at every step of an agent's trajectory, enabling online monitoring of agents over arbitrarily long sequences of actions. Empirically, we demonstrate that e-valuator provides greater statistical power and better false alarm rate control than other strategies across six datasets and three agents. We additionally show that e-valuator can be used for to quickly terminate problematic trajectories and save tokens. Together, e-valuator provides a lightweight, model-agnostic framework that converts verifier heuristics into decisions rules with statistical guarantees, enabling the deployment of more reliable agentic systems.


A Kolmogorov-Arnold Network for Interpretable Cyberattack Detection in AGC Systems

Jilan, Jehad, Nambiar, Niranjana Naveen, Saber, Ahmad Mohammad, Paranjape, Alok, Youssef, Amr, Kundur, Deepa

arXiv.org Artificial Intelligence

Automatic Generation Control (AGC) is essential for power grid stability but remains vulnerable to stealthy cyberattacks, such as False Data Injection Attacks (FDIAs), which can disturb the system's stability while evading traditional detection methods. Unlike previous works that relied on black-box approaches, this work proposes Kolmogorov-Arnold Networks (KAN) as an interpretable and accurate method for FDIA detection in AGC systems, considering the system nonlinearities. KAN models include a method for extracting symbolic equations, and are thus able to provide more interpretability than the majority of machine learning models. The proposed KAN is trained offline to learn the complex nonlinear relationships between the AGC measurements under different operating scenarios. After training, symbolic formulas that describe the trained model's behavior can be extracted and leveraged, greatly enhancing interpretability. Our findings confirm that the proposed KAN model achieves FDIA detection rates of up to 95.97% and 95.9% for the initial model and the symbolic formula, respectively, with a low false alarm rate, offering a reliable approach to enhancing AGC cybersecurity.


Principled Detection of Hallucinations in Large Language Models via Multiple Testing

Li, Jiawei, Magesh, Akshayaa, Veeravalli, Venugopal V.

arXiv.org Artificial Intelligence

While Large Language Models (LLMs) have emerged as powerful foundational models to solve a variety of tasks, they have also been shown to be prone to hallucinations, i.e., generating responses that sound confident but are actually incorrect or even nonsensical. In this work, we formulate the problem of detecting hallucinations as a hypothesis testing problem and draw parallels to the problem of out-of-distribution detection in machine learning models. We propose a multiple-testing-inspired method to solve the hallucination detection problem, and provide extensive experimental results to validate the robustness of our approach against state-of-the-art methods.



Bandit Quickest Changepoint Detection

Neural Information Processing Systems

Surveillance systems [HC11] are equipped with a suite of sensors that can be switched and steered to focus attention on any target or location over a physical landscape (see Figure 1) to detect abrupt changes at any location. On the other hand, sensor suites are resource limited, and only a limited subset, among all the locations, can be probed at any time.


On Continuous Monitoring of Risk Violations under Unknown Shift

Timans, Alexander, Verma, Rajeev, Nalisnick, Eric, Naesseth, Christian A.

arXiv.org Machine Learning

Machine learning systems deployed in the real world must operate under dynamic and often unpredictable distribution shifts. This challenges the validity of statistical safety assurances on the system's risk established beforehand. Common risk control frameworks rely on fixed assumptions and lack mechanisms to continuously monitor deployment reliability. In this work, we propose a general framework for the real-time monitoring of risk violations in evolving data streams. Leveraging the 'testing by betting' paradigm, we propose a sequential hypothesis testing procedure to detect violations of bounded risks associated with the model's decision-making mechanism, while ensuring control on the false alarm rate. Our method operates under minimal assumptions on the nature of encountered shifts, rendering it broadly applicable. We illustrate the effectiveness of our approach by monitoring risks in outlier detection and set prediction under a variety of shifts.


Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling

Palzer, David, Maciejewski, Matthew, Fosler-Lussier, Eric

arXiv.org Artificial Intelligence

ABSTRACT In recent years, end-to-end approaches have made notable progress in addressing the challenge of speaker diarization, which involves segmenting and identifying speakers in multi-talker recordings. One such approach, Encoder-Decoder Attractors (EDA), has been proposed to handle variable speaker counts as well as better guide the network during training. In this study, we extend the attractor paradigm by moving beyond direct speaker modeling and instead focus on representing more detailed'speaker attributes' through a multistage process of intermediate representations. Additionally, we enhance the architecture by replacing transformers with conformers, a convolution-augmented transformer, to model local dependencies. Experiments demonstrate improved di-arization performance on the CALLHOME dataset.


TransformDAS: Mapping {\Phi}-OTDR Signals to Riemannian Manifold for Robust Classification

Kang, Jiaju, Han, Puyu, Chun, Yang, Wang, Xu, Gong, Luqi

arXiv.org Artificial Intelligence

Phase-sensitive optical time-domain reflectometry ({\Phi}-OTDR) is a widely used distributed fiber optic sensing system in engineering. Machine learning algorithms for {\Phi}-OTDR event classification require high volumes and quality of datasets; however, high-quality datasets are currently extremely scarce in the field, leading to a lack of robustness in models, which is manifested by higher false alarm rates in real-world scenarios. One promising approach to address this issue is to augment existing data using generative models combined with a small amount of real-world data. We explored mapping both {\Phi}-OTDR features in a GAN-based generative pipeline and signal features in a Transformer classifier to hyperbolic space to seek more effective model generalization. The results indicate that state-of-the-art models exhibit stronger generalization performance and lower false alarm rates in real-world scenarios when trained on augmented datasets. TransformDAS, in particular, demonstrates the best classification performance, highlighting the benefits of Riemannian manifold mapping in {\Phi}-OTDR data generation and model classification.